Thorsten Froehlich wrote:
> In article <41163f3c$1@news.povray.org> , Nicolas Calimet
> <pov### [at] freefr> wrote:
>> Interesting. But also surprising (to me).
>> Could you explain why it takes an order of magnitude longer to
>> jump to a function via a pointer as compared to a direct reference ?
>> (note: I'm not very knowledgeable in low-level programming, just a
>> tiny idea of some assembly instructions).
>
> It should not at all.
>
Hmm... After this question, I looked further into the issue.
First of all, quoting the GCC info:
---------------------------------------------------------------------
Note that you will still be paying the penalty for the call through a
function pointer; on most modern architectures, such a call defeats the
branch prediction features of the CPU. This is also true of normal
virtual function calls.
---------------------------------------------------------------------
But this cannot account for the huge difference I measured.
And actually, my second posting on the issue must be considered partly wrong
as well: it turns out that GCC will now also inline functions which are
declared extern _and_ appear further down in the code than the calling
location, even when marked with __attribute__((noinline))!
[GCC 3.4.2 20040724 (prerelease); seems I need to file a bug report...]
Since I had not verified that these three precautions (extern declaration,
definition after the call site, noinline) actually prevented the compiler
from inlining the code, what I really measured was the time difference
between an extern call and an inlined call, which of course shows a
difference in speed.
Okay, so let's finally do some really clean benchmarks this time.
Oh dear. Could anybody perhaps run some independent tests on this?
Because I will now tell you that calling a function in an external
library is actually _faster_ than calling it directly in the code when
certain compiler flags are used (OPT1 below).
My test code is included below for review.
So here are the timings:
Function call      | OPT1  | OPT2
-------------------+-------+-------
int_foo(44.0);     | 3.95s | 3.58s
(*int_fooP)(44.0); | 3.57s | 3.46s
(*ext_fooP)(44.0); | 3.57s | 4.13s
-none-             | 0.37s | 0.37s
OPT1 = -ffast-math -O2 -fno-rtti
OPT2 = -ffast-math -O2 -fno-rtti -march=athlon-xp
All these values are reproducible over repeated runs to within +/-1 in the
last digit given, so the differences are significant.
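Hence, I think we can conclude that there is no overhead for calling a
function in a dynamically-linked external library through a dlsym()
pointer: the compiler emits exactly the same "call *%esi" sequence for
ext_fooP as for int_fooP.
[At least until somebody proves that something went wrong... :| ]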
I also checked the case where the external library calls back into the
main code: again, there is no real difference.
Wolfgang
Here are the generated assembler instructions for all measured cases
(left column: OPT1, right column: OPT2):
----------<OPT1>------------<*ext_fooP>---------<OPT2>----------------
.L7: | .L7:
movl $0, (%esp) | movl $0, (%esp)
movl $1078329344, %eax | movl $1078329344, 4(%esp)
movl %eax, 4(%esp) | call *%esi
call *%esi | ffreep %st(0)
fstp %st(0) | decl %ebx
decl %ebx | jns .L7
jns .L7 |
----------------------------<*int_fooP>-------------------------------
.L7: | .L7:
movl $0, (%esp) | movl $0, (%esp)
movl $1078329344, %eax | movl $1078329344, 4(%esp)
movl %eax, 4(%esp) | call *%esi
call *%esi | ffreep %st(0)
fstp %st(0) | decl %ebx
decl %ebx | jns .L7
jns .L7 |
----------------------------<int_foo()>-------------------------------
.L7: | .L7:
movl $0, (%esp) | movl $0, (%esp)
movl $1078329344, %eax | movl $1078329344, 4(%esp)
movl %eax, 4(%esp) | call int_foo
call int_foo | ffreep %st(0)
fstp %st(0) | decl %ebx
decl %ebx | jns .L7
jns .L7 |
-----------------------------<-none->---------------------------------
.L7: | .L7:
decl %eax | decl %eax
jns .L7 | jns .L7
---------------------------------^------------------------------------
Here are the test programs:
---<Makefile>---------------------------------------------------------
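# OPT1 flags are active; swap in the commented-out lines for OPT2.
MAINFLAGS = -ffast-math -O2 -fno-rtti
LIBFLAGS  = -ffast-math -O2 -fno-rtti
#MAINFLAGS = -ffast-math -O2 -fno-rtti -march=athlon-xp
#LIBFLAGS  = -ffast-math -O2 -fno-rtti -march=athlon-xp

.PHONY: all asm

# main() goes into dl.o and int_foo into a separate foo.o, so int_foo
# cannot be inlined; ext_foo is built as the shared library foo.so.
# -rdynamic exports the main program's symbols so foo.so can call back.
all:
	g++ $(MAINFLAGS) -DMODULE=0 -DMAIN -c dl.cc -o dl.o
	g++ $(MAINFLAGS) -DMODULE=0 -DFOO -c dl.cc -o foo.o
	g++ $(MAINFLAGS) -o test dl.o foo.o -rdynamic -ldl -lm
	g++ $(LIBFLAGS) -nostartfiles -shared -DMODULE=1 dl.cc -o foo.so
	time ./test

# Emit assembler listings for inspection.
asm:
	gcc $(MAINFLAGS) -fno-exceptions -DMODULE=0 -DMAIN -S dl.cc -o dl.S
	gcc $(MAINFLAGS) -fno-exceptions -DMODULE=0 -DFOO -S dl.cc -o foo.S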
------------------------------------------------------------------------
---<dl.cc>--------------------------------------------------------------
// dl.cc - Written by Wolfgang Wieser.
#if MODULE==0
//------------------
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <string.h>
#include <errno.h>
#include <sys/mman.h>

extern "C" double int_foo(double x) __attribute__((noinline));

#ifdef FOO
// Compiled separately into foo.o, so main() cannot inline it.
double int_foo(double x)
{
    //fprintf(stderr,"int_foo\n");
    return(x);
}
#endif // FOO

#ifdef MAIN
int main()
{
    // Load the shared library and look up ext_foo by name.
    void *hdl=dlopen("./foo.so",RTLD_NOW | RTLD_LOCAL);
    if(!hdl)
    { fprintf(stderr,"dlopen: %s\n",dlerror()); exit(1); }
    dlerror();  // clear any stale error state before calling dlsym()
    void *sym=dlsym(hdl,"ext_foo");
    const char *err;
    if((err=dlerror()))
    { fprintf(stderr,"dlsym: %s\n",err); exit(1); }

    double (*ext_fooP)(double)=(double (*)(double))sym;
    double (*int_fooP)(double)=&int_foo;

    // These calls make the assembler output easier to compare because
    // they prevent the function pointers from being optimized away as
    // unneeded variables.
    int_foo(23.0);
    (*ext_fooP)(23.0);
    (*int_fooP)(23.0);

    // Benchmark loop: enable exactly one of the three calls
    // (leave all three commented out for the "-none-" row).
    for(int i=0; i<0xfffffff; i++)
    {
        //int_foo(44.0);
        //(*int_fooP)(44.0);
        (*ext_fooP)(44.0);
    }
    return(0);
}
#endif // MAIN

#else // MODULE!=0
//------------------
// Built into foo.so; extern "C" keeps the symbol name unmangled for dlsym().
#include <stdio.h>
extern "C" double ext_foo(double x)
{
    //fprintf(stderr,"ext_foo\n");
    return(x);
}
#endif
------------------------------------------------------------------------